AITopics | kernel density estimation

Collaborating Authors

kernel density estimation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Fast Approximation of Similarity Graphs with Kernel Density Estimation

Neural Information Processing SystemsDec-26-2025, 21:29:16 GMT

Constructing a similarity graph from a set $X$ of data points in $ \mathbb{R}^d$ is the first step of many modern clustering algorithms. However, typical constructions of a similarity graph have high time complexity, and a quadratic space dependency with respect to $|X|$. We address this limitation and present a new algorithmic framework that constructs a sparse approximation of the fully connected similarity graph while preserving its cluster structure. Our presented algorithm is based on the kernel density estimation problem, and is applicable for arbitrary kernel functions. We compare our designed algorithm with the well-known implementations from the scikit-learn library and the FAISS library, and find that our method significantly outperforms the implementation from both libraries on a variety of datasets.

fast approximation, kernel density estimation, similarity graph, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Add feedback

A Fair Classifier Using Kernel Density Estimation

Neural Information Processing SystemsDec-24-2025, 10:38:24 GMT

As machine learning becomes prevalent in a widening array of sensitive applications such as job hiring and criminal justice, one critical aspect that machine learning classifiers should respect is to ensure fairness: guaranteeing the irrelevancy of a prediction output to sensitive attributes such as gender and race. In this work, we develop a kernel density estimation trick to quantify fairness measures that capture the degree of the irrelevancy. A key feature of our approach is that quantified fairness measures can be expressed as differentiable functions w.r.t.

fair classifier, kernel density estimation, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Unsupervised Learning of Density Estimates with Topological Optimization

Tanweer, Sunia, Khasawneh, Firas A.

arXiv.org Machine LearningDec-17-2025

Kernel density estimation is a key component of a wide variety of algorithms in machine learning, Bayesian inference, stochastic dynamics and signal processing. However, the unsupervised density estimation technique requires tuning a crucial hyperparameter: the kernel bandwidth. The choice of bandwidth is critical as it controls the bias-variance trade-off by over- or under-smoothing the topological features. Topological data analysis provides methods to mathematically quantify topological characteristics, such as connected components, loops, voids et cetera, even in high dimensions where visualization of density estimates is impossible. In this paper, we propose an unsupervised learning approach using a topology-based loss function for the automated and unsupervised selection of the optimal bandwidth and benchmark it against classical techniques -- demonstrating its potential across different dimensions.

bandwidth, density estimation, dimension, (15 more...)

arXiv.org Machine Learning

2512.08895

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression

Enwerem, Clinton, Puranic, Aniruddh G., Baras, John S., Belta, Calin

arXiv.org Artificial IntelligenceDec-9-2025

Mainstream approximate action-value iteration reinforcement learning (RL) algorithms suffer from overestimation bias, leading to suboptimal policies in high-variance stochastic environments. Quantile-based action-value iteration methods reduce this bias by learning a distribution of the expected cost-to-go using quantile regression. However, ensuring that the learned policy satisfies safety constraints remains a challenge when these constraints are not explicitly integrated into the RL framework. Existing methods often require complex neural architectures or manual tradeoffs due to combined cost functions. To address this, we propose a risk-regularized quantile-based algorithm integrating Conditional Value-at-Risk (CVaR) to enforce safety without complex architectures. We also provide theoretical guarantees on the contraction properties of the risk-sensitive distributional Bellman operator in Wasserstein space, ensuring convergence to a unique cost distribution. Simulations of a mobile robot in a dynamic reach-avoid task show that our approach leads to more goal successes, fewer collisions, and better safety-performance trade-offs than risk-neutral methods.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2506.06954

Country: North America > United States > Maryland (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Aerospace & Defense (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation

Danait, Ved, Das, Srijan, Bhore, Sujoy

arXiv.org Machine LearningOct-28-2025

Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density Estimation (A-KDE) are fundamental problems at the core of modern machine learning, with broad applications in data analysis, information systems, and large-scale decision making. In massive and dynamic data streams, a central challenge is to design compact sketches that preserve essential structural properties of the data while enabling efficient queries. In this work, we develop new sketching algorithms that achieve sublinear space and query time guarantees for both ANN and A-KDE for a dynamic stream of data. For ANN in the streaming model, under natural assumptions, we design a sublinear sketch that requires only $\mathcal{O}(n^{1+ρ-η})$ memory by storing only a sublinear ($n^{-η}$) fraction of the total inputs, where $ρ$ is a parameter of the LSH family, and $0<η<1$. Our method supports sublinear query time, batch queries, and extends to the more general Turnstile model. While earlier works have focused on Exact NN, this is the first result on ANN that achieves near-optimal trade-offs between memory size and approximation error. Next, for A-KDE in the Sliding-Window model, we propose a sketch of size $\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+ε} - 1} \log^2 N\right)$, where $R$ is the number of sketch rows, $W$ is the LSH range, $N$ is the window size, and $ε$ is the approximation error. This, to the best of our knowledge, is the first theoretical sublinear sketch guarantee for A-KDE in the Sliding-Window model. We complement our theoretical results with experiments on various real-world datasets, which show that the proposed sketches are lightweight and achieve consistently low error in practice.

artificial intelligence, machine learning, probability, (15 more...)

arXiv.org Machine Learning

2510.23039

Country:

Asia > India > Maharashtra > Mumbai (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.62)

Add feedback

Space and Time Efficient Kernel Density Estimation in High Dimensions

Arturs Backurs, Piotr Indyk, Tal Wagner

Neural Information Processing SystemsOct-3-2025, 09:27:30 GMT

However, their data structure requires a significantly increased super-linear storage space, as well as super-linear preprocessing time.

algorithm, dataset, kernel, (14 more...)

Neural Information Processing Systems

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.42)

Add feedback

Efficient Identification of High Similarity Clusters in Polygon Datasets

Daras, John N.

arXiv.org Artificial IntelligenceSep-30-2025

Abstract--Advancements in tools like Shapely 2.0 and Triton can significantly improve the efficiency of spatial similarity computations by enabling faster and more scalable geometric operations [1], [2]. However, for extremely large datasets, these optimizations may face challenges due to the sheer volume of computations required. T o address this, we propose a framework that reduces the number of clusters requiring verification, thereby decreasing the computational load on these systems. The framework integrates dynamic similarity index thresholding, supervised scheduling [3], and recall-constrained optimization to efficiently identify clusters with the highest spatial similarity while meeting user-defined precision and recall requirements [4]. By leveraging Kernel Density Estimation (KDE) to dynamically determine similarity thresholds [5] and machine learning models to prioritize clusters, our approach achieves substantial reductions in computational cost without sacrificing accuracy. Experimental results demonstrate the scalability and effectiveness of the method, offering a practical solution for large-scale geospatial analysis. Geospatial data constitutes the cornerstone of numerous applications across various domains, including urban planning, environmental monitoring, infrastructure development, and medicine. For example, OpenStreetMap contains global data amounting to over 1.5 terabytes [6], while GeoNames describes more than 12 million locations, providing extensive point geometries such as latitude and longitude [7]. Expanding these datasets, geospatial knowledge graphs like Y AGO2geo integrate millions of lines, polygons, and multi-polygons from OpenStreetMap and administrative divisions [8], while WorldKG represents around 113.4 million geographic entities [9]. KnowWhereGraph, a more recent initiative, comprises over 12 billion RDF triples, including data on polygons and multipolygons, and supports applications in disaster relief, agricultural land use, and food-related supply chains [10]. Even cross-domain knowledge graphs such as DBpedia and Wikidata incorporate a substantial amount of geospatial information, underscoring the critical role of spatial data on the Web. Beyond these well-known repositories, spatial datasets also play a transformative role in medicine, particularly in the analysis and modeling of organ structures. For instance, the Visible Human Project provides high-resolution spatial data for anatomical structures [11], while the Human Connectome Project captures detailed spatial relationships within the brain [12].

artificial intelligence, geometry, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.23942

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Global Optimization of Stochastic Black-Box Functions with Arbitrary Noise Distributions using Wilson Score Kernel Density Estimation

Iversen, Thorbjørn Mosekjær, Sørensen, Lars Carøe, Mathiesen, Simon Faarvang, Petersen, Henrik Gordon

arXiv.org Machine LearningSep-12-2025

Many optimization problems in robotics involve the optimization of time-expensive black-box functions, such as those involving complex simulations or evaluation of real-world experiments. Furthermore, these functions are often stochastic as repeated experiments are subject to unmeasurable disturbances. Bayesian optimization can be used to optimize such methods in an efficient manner by deploying a probabilistic function estimator to estimate with a given confidence so that regions of the search space can be pruned away. Consequently, the success of the Bayesian optimization depends on the function estimator's ability to provide informative confidence bounds. Existing function estimators require many function evaluations to infer the underlying confidence or depend on modeling of the disturbances. In this paper, it is shown that the confidence bounds provided by the Wilson Score Kernel Density Estimator (WS-KDE) are applicable as excellent bounds to any stochastic function with an output confined to the closed interval [0;1] regardless of the distribution of the output. This finding opens up the use of WS-KDE for stable global optimization on a wider range of cost functions. The properties of WS-KDE in the context of Bayesian optimization are demonstrated in simulation and applied to the problem of automated trap design for vibrational part feeders.

bayesian optimization, experiment, optimization, (12 more...)

arXiv.org Machine Learning

2509.09238

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Transportation > Air (0.61)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Contributions to Robust and Efficient Methods for Analysis of High Dimensional Data

Yang, Kai

arXiv.org Artificial IntelligenceSep-11-2025

A ubiquitous feature of data of our era is their extra-large sizes and dimensions. Analyzing such high-dimensional data poses significant challenges, since the feature dimension is often much larger than the sample size. This thesis introduces robust and computationally efficient methods to address several common challenges associated with high-dimensional data. In my first manuscript, I propose a coherent approach to variable screening that accommodates nonlinear associations. I develop a novel variable screening method that transcends traditional linear assumptions by leveraging mutual information, with an intended application in neuroimaging data. This approach allows for accurate identification of important variables by capturing nonlinear as well as linear relationships between the outcome and covariates. Building on this foundation, I develop new optimization methods for sparse estimation using nonconvex penalties in my second manuscript. These methods address notable challenges in current statistical computing practices, facilitating computationally efficient and robust analyses of complex datasets. The proposed method can be applied to a general class of optimization problems. In my third manuscript, I contribute to robust modeling of high-dimensional correlated observations by developing a mixed-effects model based on Tsallis power-law entropy maximization and discussed the theoretical properties of such distribution. This model surpasses the constraints of conventional Gaussian models by accommodating a broader class of distributions with enhanced robustness to outliers. Additionally, I develop a proximal nonlinear conjugate gradient algorithm that accelerates convergence while maintaining numerical stability, along with rigorous statistical properties for the proposed framework.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.08155

Country:

Europe (0.92)
North America > United States > New York (0.28)
North America > Canada > Quebec (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.45)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Mathematics of Computing (1.00)
Information Technology > Data Science > Data Mining (1.00)
(4 more...)

Add feedback

Kernel $k$-Medoids as General Vector Quantization

Gerlach, Thore, Mücke, Sascha, Bauckhage, Christian

arXiv.org Artificial IntelligenceSep-8-2025

Vector Quantization (VQ) is a widely used technique in machine learning and data compression, valued for its simplicity and interpretability. Among hard VQ methods, $k$-medoids clustering and Kernel Density Estimation (KDE) approaches represent two prominent yet seemingly unrelated paradigms -- one distance-based, the other rooted in probability density matching. In this paper, we investigate their connection through the lens of Quadratic Unconstrained Binary Optimization (QUBO). We compare a heuristic QUBO formulation for $k$-medoids, which balances centrality and diversity, with a principled QUBO derived from minimizing Maximum Mean Discrepancy in KDE-based VQ. Surprisingly, we show that the KDE-QUBO is a special case of the $k$-medoids-QUBO under mild assumptions on the kernel's feature map. This reveals a deeper structural relationship between these two approaches and provides new insight into the geometric interpretation of the weighting parameters used in QUBO formulations for VQ.

artificial intelligence, machine learning, qubo formulation, (13 more...)

arXiv.org Artificial Intelligence

2506.04786

Country: Europe > Germany > North Rhine-Westphalia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback